A flexible, scalable finite-state transducer architecture for corpus-based concatenative speech synthesis

Authors

  • Jon R. W. Yi
  • James R. Glass
  • I. Lee Hetherington
Abstract

In this paper we describe the conversion of our phonologically-based synthesizer into a finite-state transducer (FST) representation that supports real-time, natural-sounding synthesis. We have designed a transducer structure that efficiently performs unit selection, a task common to concatenative speech synthesis. By encapsulating domain-independent concatenative synthesis costs in a constraint kernel, we have obtained a topology that scales linearly with the size of the synthesis corpus. The FST representation provides a flexible, unified framework in which we can leverage our previous work in speech recognition, in areas such as pronunciation modelling and search. The FST synthesizer has been incorporated into two servers that operate within our conversational system architecture to convert meaning representations into waveforms. We have had preliminary success with the new FST-based synthesis in several constrained spoken dialogue applications.
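
The abstract does not spell out the transducer topology or the cost functions. The Python below is therefore only a minimal sketch, under stated assumptions, of why routing joins through a constraint kernel keeps things linear in corpus size. `Unit`, `Kernel`, `DEFAULT_JOIN_COST`, `arc_counts` and `select_units` are hypothetical names. The arc-counting model is a simplification, the sketch uses no target costs, and concatenation costs are indexed only by phone pairs, which is far cruder than a real class-based cost set. Units whose indices are consecutive are assumed to have been recorded back to back and join for free.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical constraint kernel: cost of joining a unit of phone class A to a
# unit of phone class B. Its size depends only on the number of classes,
# never on the number of units in the corpus.
Kernel = Dict[Tuple[str, str], float]

DEFAULT_JOIN_COST = 1.0  # assumed cost for class pairs missing from the kernel


@dataclass(frozen=True)
class Unit:
    uid: int    # position in a hypothetical corpus; uid and uid + 1 are corpus neighbours
    phone: str  # phonetic label the unit realizes


def arc_counts(n_units: int, n_classes: int) -> Tuple[int, int]:
    """Arcs in a fully connected topology vs. a kernel topology (simplified model)."""
    full = n_units * (n_units - 1)  # a direct arc from every unit to every other unit
    # Kernel topology: one arc from each unit into the kernel, one arc out of the
    # kernel into each unit, one free arc between corpus neighbours, plus the
    # class-to-class arcs inside the kernel.
    kernel = 2 * n_units + (n_units - 1) + n_classes ** 2
    return full, kernel


def select_units(targets: List[str], corpus: List[Unit], kernel: Kernel) -> Tuple[float, List[int]]:
    """Viterbi unit selection in which non-adjacent joins are routed through the kernel."""
    cands = [[u for u in corpus if u.phone == t] for t in targets]
    if not targets or any(not c for c in cands):
        raise ValueError("every target phone needs at least one candidate unit")
    cost: Dict[int, float] = {u.uid: 0.0 for u in cands[0]}  # no target costs in this sketch
    back: List[Dict[int, int]] = [{}]
    for i in range(1, len(targets)):
        # All previous candidates share the phone targets[i - 1], so the cheapest
        # way into the kernel is a single min over the previous column.
        exit_uid = min(cost, key=cost.get)
        exit_cost = cost[exit_uid]
        new_cost: Dict[int, float] = {}
        pointers: Dict[int, int] = {}
        for cur in cands[i]:
            join = kernel.get((targets[i - 1], cur.phone), DEFAULT_JOIN_COST)
            best, best_prev = exit_cost + join, exit_uid
            # Free arc: the unit's corpus predecessor was itself a candidate.
            prev_uid = cur.uid - 1
            if prev_uid in cost and cost[prev_uid] < best:
                best, best_prev = cost[prev_uid], prev_uid
            new_cost[cur.uid] = best
            pointers[cur.uid] = best_prev
        cost = new_cost
        back.append(pointers)
    # Trace back from the cheapest final candidate.
    last = min(cost, key=cost.get)
    path = [last]
    for i in range(len(targets) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return cost[last], path[::-1]


corpus = [Unit(i, p) for i, p in enumerate(["h", "ae", "t", "s", "k", "ae", "b"])]
kernel = {("k", "ae"): 0.5, ("ae", "t"): 2.0}
print(select_units(["k", "ae", "t"], corpus, kernel))  # (0.5, [4, 1, 2])
print(arc_counts(1000, 50))  # (999000, 5499)
```

In this toy run the search takes "k" from one recording and the contiguous "ae t" from another. It pays the kernel cost once rather than twice. In the arc counts, the fully connected topology grows quadratically with the number of units, while the kernel topology grows linearly plus a fixed class-pair term. Because the join cost here depends only on phone classes, each search step visits every candidate a constant number of times instead of once per candidate pair. A real FST system would obtain the same effect from its generic shortest-path search rather than from a hand-written loop like this one.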

Similar papers

Joint prosody prediction and unit selection for concatenative speech synthesis

In this paper we describe how prosody prediction can be efficiently integrated with the unit selection process in a concatenative speech synthesizer under a weighted finite-state transducer (WFST) architecture. WFSTs representing prosody prediction and unit selection can be composed during synthesis, thus effectively expanding the space of possible prosodic targets. We implemented a symbolic pr...

Unit selection for speech synthesis using splicing costs with weighted finite state transducers

In this paper we describe how unit selection for concatenative speech synthesis can be implemented efficiently for sub-phonetic units using weighted finite state transducers (WFST). We also introduce splicing costs as a measure to indicate which unit boundaries are particularly good or poor joint points. Splicing costs extend the flexibility offered by the unit selection paradigm. Through a per...

Flexible Speech Synthesis Using Weighted Finite State Transducers

FSM and k-nearest-neighbor for corpus based video-realistic audio-visual synthesis

In this paper we introduce a corpus-based 2D video-realistic audio-visual synthesis system. The system combines a concatenative Text-to-Speech (TTS) system with a concatenative Text-to-Visual (TTV) system into a lip-movement-synchronized Text-to-Audio-Visual-Speech (TTAVS) system. For the concatenative TTS we are using a Finite State Machine approach to select non-uniform variable-size audio ...

Publication date: 2000